Steerable models can provide very general and flexible equivariance by formulating equivariance requirements in the language of representation theory and feature fields, and they have been effective for many vision tasks. However, deriving 3D rotation-equivariant models is much more difficult than in the 2D case, because the mathematics of 3D rotations is more complicated. In this work, we employ partial differential operators (PDOs) to model 3D filters and derive general steerable 3D CNNs, called PDO-s3DCNNs. We prove that the equivariant filters are subject to linear constraints, which can be solved efficiently under various conditions. To our knowledge, PDO-s3DCNNs are the most general steerable CNNs for 3D rotations, in that they cover all common subgroups of $SO(3)$ and their representations, while existing methods can only be applied to specific groups and representations. Extensive experiments show that our models preserve equivariance well in the discrete domain, and outperform previous works on both the SHREC'17 retrieval and ISBI 2012 segmentation tasks with low network complexity.
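The linear-constraint structure mentioned above suggests a generic computational pattern: equivariance imposes a homogeneous linear system $A w = 0$ on the filter parameters $w$, and a basis of admissible filters is the null space of $A$. The sketch below illustrates only this generic step with a placeholder constraint matrix; constructing the actual matrix from PDOs and $SO(3)$ representations is the paper-specific part not shown here.

```python
# Minimal sketch: solve a linear equivariance constraint A w = 0 for a filter basis.
# The constraint matrix here is a random placeholder, not the paper's construction.
import numpy as np
from scipy.linalg import null_space

def equivariant_filter_basis(constraint_matrix):
    """Return a basis of filter parameter vectors w satisfying A w = 0."""
    basis = null_space(constraint_matrix)   # columns span the admissible filter space
    return basis                            # shape: (num_params, num_free_dims)

# toy usage with a hypothetical constraint matrix over a 3x3x3 stencil (27 params):
A = np.random.randn(20, 27)
B = equivariant_filter_basis(A)
print(B.shape, np.allclose(A @ B, 0))
```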
Deep neural networks are able to memorize noisy labels easily with the softmax cross-entropy (CE) loss. Previous studies attempting to address this issue have focused on incorporating a noise-robust loss function into the CE loss. However, the memorization issue is alleviated but still remains due to the non-robust CE loss. To address this issue, we focus on learning robust contrastive representations of data on which the classifier finds it hard to memorize the label noise under the CE loss. We propose a novel contrastive regularization function to learn such representations over noisy data, where the label noise does not dominate the representation learning. By theoretically investigating the representations induced by the proposed regularization function, we reveal that the learned representations keep the information related to true labels and discard the information related to corrupted labels. Moreover, our theoretical results also indicate that the learned representations are robust to label noise. The effectiveness of this method is demonstrated with experiments on benchmark datasets.
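As a rough illustration of how a contrastive regularization term can be combined with the CE loss, consider the sketch below. The specific regularizer (a supervised-contrastive-style term), the weight `lambda_reg`, and the temperature are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: CE loss plus a contrastive regularizer on the learned representations.
import torch
import torch.nn.functional as F

def contrastive_regularizer(features, labels, temperature=0.5):
    """Pull together representations sharing a (possibly noisy) label, push apart the rest."""
    z = F.normalize(features, dim=1)                       # (B, d) unit-norm embeddings
    sim = z @ z.t() / temperature                          # (B, B) pairwise similarities
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(0)                             # drop self-pairs from positives
    logits_mask = torch.ones_like(sim).fill_diagonal_(0)   # drop self-pairs from denominator
    log_prob = sim - torch.log((logits_mask * sim.exp()).sum(1, keepdim=True))
    pos_per_anchor = pos_mask.sum(1).clamp(min=1)
    return -(pos_mask * log_prob).sum(1).div(pos_per_anchor).mean()

def total_loss(logits, features, labels, lambda_reg=1.0):
    # CE drives classification; the contrastive term regularizes the representations
    # so that label noise does not dominate what the encoder learns.
    return F.cross_entropy(logits, labels) + lambda_reg * contrastive_regularizer(features, labels)
```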
Weakly supervised object localization (WSOL) relaxes the requirement of dense annotations for object localization by using image-level classification labels to supervise its learning process. However, current WSOL methods suffer from over-activation of background locations and require post-processing to obtain the localization mask. This paper attributes these issues to the unawareness of background cues, and proposes a background-aware classification activation map (B-CAM) that simultaneously learns localization scores of both object and background using only image-level labels. In our B-CAM, two image-level features, aggregated from pixel-level features of potential background and object locations, are used to purify the object feature from object-related background and to represent the feature of pure-background samples, respectively. Based on these two features, an object classifier and a background classifier are then learned to determine the binary object localization mask. Our B-CAM can be trained in an end-to-end manner based on the proposed stagger classification loss, which not only improves object localization but also suppresses background activation. Experiments show that our B-CAM outperforms one-stage WSOL methods on the CUB-200, OpenImages, and VOC2012 datasets.
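The two-branch aggregation described above might be sketched as follows; the module names, the sigmoid background-attention map, and the masked average-pooling rule are assumptions for illustration, and the stagger classification loss is not reproduced.

```python
# Rough sketch: aggregate pixel-level features into an object feature and a
# background feature via a predicted background map, then classify each.
import torch
import torch.nn as nn

class BackgroundAwareHead(nn.Module):
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.bg_attn = nn.Conv2d(in_ch, 1, kernel_size=1)  # background cue map
        self.obj_cls = nn.Linear(in_ch, num_classes)        # object classifier
        self.bg_cls = nn.Linear(in_ch, 1)                    # background classifier

    def forward(self, feat):                                 # feat: (B, C, H, W)
        a = torch.sigmoid(self.bg_attn(feat))                # (B, 1, H, W) background weights
        bg_feat = (feat * a).flatten(2).sum(-1) / a.flatten(2).sum(-1).clamp(min=1e-6)
        obj_feat = (feat * (1 - a)).flatten(2).sum(-1) / (1 - a).flatten(2).sum(-1).clamp(min=1e-6)
        return self.obj_cls(obj_feat), self.bg_cls(bg_feat), a   # logits + background map
```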
Semi-supervised video action recognition tends to enable deep neural networks to achieve remarkable performance even with very limited labeled data. However, existing methods are mainly transferred from current image-based methods (e.g., FixMatch); without specifically utilizing the temporal dynamics and inherent multimodal attributes, their results could be suboptimal. To better leverage the temporal information encoded in videos, we introduce temporal gradient as an additional modality for more attentive feature extraction in this paper. Specifically, our method explicitly distills fine-grained motion representations from temporal gradient (TG) and imposes consistency across different modalities (i.e., RGB and TG). The performance of semi-supervised action recognition is significantly improved without additional computation or parameters during inference. Our method achieves state-of-the-art performance on three video action recognition benchmarks (i.e., Kinetics-400, UCF-101, and HMDB-51) under several typical semi-supervised settings (i.e., different ratios of labeled data).
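A minimal sketch of the temporal-gradient modality and a simple cross-modality consistency term is given below; the tensor layout, the encoder placeholders, and the cosine-based loss are assumptions rather than the paper's exact distillation objective.

```python
# Sketch, assuming clips shaped (B, C, T, H, W): TG as frame differences plus
# a cosine consistency loss between RGB and TG features.
import torch
import torch.nn.functional as F

def temporal_gradient(clip):
    """clip: (B, C, T, H, W) RGB video; TG is the frame-to-frame difference."""
    return clip[:, :, 1:] - clip[:, :, :-1]          # (B, C, T-1, H, W)

def consistency_loss(rgb_feat, tg_feat):
    """Align RGB and TG representations (cosine distance as one simple choice)."""
    rgb = F.normalize(rgb_feat, dim=-1)
    tg = F.normalize(tg_feat, dim=-1)
    return (1 - (rgb * tg).sum(-1)).mean()

# usage (encoders are placeholders for any video backbone):
# tg = temporal_gradient(rgb_clip)
# loss = consistency_loss(rgb_encoder(rgb_clip), tg_encoder(tg))
```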
Cloth in the real world is often crumpled, self-occluded, or folded in on itself such that key regions, such as corners, are not directly graspable, making manipulation difficult. We propose a system that leverages visual and tactile perception to unfold the cloth via grasping and sliding on edges. By doing so, the robot is able to grasp two adjacent corners, enabling subsequent manipulation tasks like folding or hanging. As components of this system, we develop tactile perception networks that classify whether an edge is grasped and estimate the pose of the edge. We use the edge classification network to supervise a visuotactile edge grasp affordance network that can grasp edges with a 90% success rate. Once an edge is grasped, we demonstrate that the robot can slide along the cloth to the adjacent corner using tactile pose estimation/control in real time. See http://nehasunil.com/visuotactile/visuotactile.html for videos.
Abstractive summarization is the process of generating a summary given a document as input. Although significant progress has been made, the factual inconsistency between the document and the generated summary still limits its practical applications. Previous work found that the probabilities assigned by the generation model reflect its preferences for the generated summary, including the preference for factual consistency, and the preference for the language or knowledge prior as well. To separate the preference for factual consistency, we propose an unsupervised framework named CoP by controlling the preference of the generation model with the help of a prompt. More specifically, the framework performs an extra inference step in which a text prompt is introduced as an additional input. In this way, another preference is described by the generation probability of this extra inference process. The difference between the above two preferences, i.e., the difference between the probabilities, could be used as a measurement for detecting factual inconsistencies. Interestingly, we found that with a properly designed prompt, our framework could evaluate specific preferences and serve as a measurement for fine-grained categories of inconsistency, such as entity-related inconsistency, coreference-related inconsistency, etc. Moreover, our framework could also be extended to the supervised setting to learn a better prompt from the labeled data as well. Experiments show that our framework achieves new SOTA results on three factual inconsistency detection tasks.
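The probability-difference idea could be sketched roughly as below: score the summary once with the document alone and once with a text prompt appended, and use the gap between the two (log-)probabilities as an inconsistency signal. The BART checkpoint, the simple concatenation of document and prompt, and the scoring function are placeholder assumptions, not the paper's exact configuration.

```python
# Hedged sketch of prompt-controlled probability difference for inconsistency detection.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn").eval()

@torch.no_grad()
def summary_logprob(source_text, summary):
    enc = tok(source_text, return_tensors="pt", truncation=True)
    dec = tok(summary, return_tensors="pt", truncation=True)
    out = model(**enc, labels=dec["input_ids"])
    return -out.loss.item() * dec["input_ids"].size(1)    # total log-prob of the summary

def inconsistency_score(document, summary, prompt):
    base = summary_logprob(document, summary)
    prompted = summary_logprob(document + " " + prompt, summary)
    return prompted - base   # the preference gap serves as the detection signal
```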
People constantly use language to learn about the world. Computational linguists have capitalized on this fact to build large language models (LLMs) that acquire co-occurrence-based knowledge from language corpora. LLMs achieve impressive performance on many tasks, but the robustness of their world knowledge has been questioned. Here, we ask: do LLMs acquire generalized knowledge about real-world events? Using curated sets of minimal sentence pairs (n=1215), we tested whether LLMs are more likely to generate plausible event descriptions compared to their implausible counterparts. We found that LLMs systematically distinguish possible and impossible events (The teacher bought the laptop vs. The laptop bought the teacher) but fall short of human performance when distinguishing likely and unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLMs generalize well across syntactic sentence variants (active vs passive) but less well across semantic sentence variants (synonymous sentences), (iii) some, but not all LLM deviations from ground-truth labels align with crowdsourced human judgments, and (iv) explicit event plausibility information emerges in middle LLM layers and remains high thereafter. Overall, our analyses reveal a gap in LLMs' event knowledge, highlighting their limitations as generalized knowledge bases. We conclude by speculating that the differential performance on impossible vs. unlikely events is not a temporary setback but an inherent property of LLMs, reflecting a fundamental difference between linguistic knowledge and world knowledge in intelligent systems.
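A minimal sketch of this kind of probe, scoring both members of a minimal pair with a causal language model (GPT-2 here as an arbitrary stand-in for the LLMs studied) and checking whether the plausible sentence receives the higher score:

```python
# Sketch: compare LM log-probabilities of a plausible vs. implausible event description.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def sentence_logprob(sentence):
    ids = tok(sentence, return_tensors="pt")["input_ids"]
    out = model(ids, labels=ids)                 # loss = mean token-level NLL
    return -out.loss.item() * (ids.size(1) - 1)  # approximate total log-probability

pair = ("The teacher bought the laptop.", "The laptop bought the teacher.")
plausible, implausible = (sentence_logprob(s) for s in pair)
print("model prefers the plausible event:", plausible > implausible)
```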
As the COVID-19 pandemic puts pressure on healthcare systems worldwide, computed tomography image based AI diagnostic systems have become a sustainable solution for early diagnosis. However, the model-wise vulnerability under adversarial perturbation hinders their deployment in practical situations. Existing adversarial training strategies are difficult to generalize to the medical imaging field, which is challenged by complex medical texture features. To overcome this challenge, we propose a Contour Attention Preserving (CAP) method based on lung cavity edge extraction. The contour prior features are injected into the attention layer via parameter regularization, and we optimize the robust empirical risk with a hybrid distance metric. We then introduce a new cross-nation CT scan dataset to evaluate the generalization capability of the adversarial robustness under distribution shift. Experimental results indicate that the proposed method achieves state-of-the-art performance in multiple adversarial defense and generalization tasks. The code and dataset are available at https://github.com/Quinn777/CAP.
We propose a novel recurrent graph network (RGN) approach for predicting discrete marked event sequences by learning the underlying complex stochastic process. Using the framework of point processes, we interpret a marked discrete event sequence as the superposition of different sequences, each consisting of events of a unique type. The nodes of the graph network use LSTMs to incorporate past information, while a graph attention network (GAT) introduces a strong inductive bias to capture the interactions among these different event types. By changing the self-attention mechanism from attending over past events to attending over event types, we reduce the time and space complexity from $\mathcal{O}(n^2)$ (the total number of events) to $\mathcal{O}(|\mathcal{Y}|^2)$ (the number of event types). Experiments show that, compared with state-of-the-art Transformer-based architectures, the proposed approach improves performance in terms of log-likelihood while having lower time and space complexity.
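The complexity argument can be made concrete with a schematic sketch: keep one LSTM state per event type and attend over the $|\mathcal{Y}|$ type nodes rather than over all past events. The module below is an illustrative assumption, not the paper's architecture, but it shows why the attention cost is independent of the sequence length.

```python
# Schematic sketch: per-type LSTM states plus attention over event types only.
import torch
import torch.nn as nn

class TypeLevelRGN(nn.Module):
    def __init__(self, num_types, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(num_types, hidden)
        self.cell = nn.LSTMCell(hidden, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.num_types, self.hidden = num_types, hidden
        self.state = None                          # (h, c): per-type hidden/cell states

    def forward(self, event_type):
        event_type = torch.as_tensor(event_type)
        if self.state is None:
            self.state = (torch.zeros(self.num_types, self.hidden),
                          torch.zeros(self.num_types, self.hidden))
        h, c = self.state
        # update only the node of the observed event type
        x = self.embed(event_type).unsqueeze(0)
        h_k, c_k = self.cell(x, (h[event_type].unsqueeze(0), c[event_type].unsqueeze(0)))
        h, c = h.clone(), c.clone()
        h[event_type], c[event_type] = h_k[0], c_k[0]
        self.state = (h, c)
        # attention over the |Y| type nodes only -> O(|Y|^2), independent of sequence length
        out, _ = self.attn(h.unsqueeze(0), h.unsqueeze(0), h.unsqueeze(0))
        return out.squeeze(0)                      # contextualized per-type states
```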
The ability of pretrained Transformers to memorize factual knowledge is essential for downstream tasks such as closed-book question answering. Existing work has shown that pretrained Transformers can, to some degree, recall or leverage the factual knowledge that appears in the pretraining corpus. However, due to the limit of model capacity, the ability of pretrained models to memorize knowledge is also limited. Dai et al. (2022) found that the feed-forward networks (FFNs) in pretrained Transformers store factual knowledge in a memory-like manner. Inspired by this finding, we propose a Neural Knowledge Bank (NKB) to store extra factual knowledge for pretrained Transformers. To be specific, we also regard FFNs as key-value memories and extend them with additional memory slots. During knowledge injection, we keep the original model fixed and inject factual knowledge into the extended memory slots, so the pretrained model does not forget what it has already learned. Moreover, the view of FFNs as key-value memories makes the NKB highly interpretable. We use three closed-book question answering datasets to show the strong ability of the NKB to store extra factual knowledge. We also demonstrate that the NKB does not degrade the general language generation ability of the pretrained model on two representative generation tasks, summarization and machine translation. Furthermore, we thoroughly analyze the NKB to reveal its working mechanism and present the meaning of its keys and values in a human-readable way. On top of that, we perform a preliminary attempt to directly update the factual knowledge in the NKB without any additional training.
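A simplified sketch of the key-value-memory view is given below: wrap a frozen pretrained FFN and add extra trainable key/value slots whose output is added to the original FFN output. The sizes, initialization, and additive combination are assumptions for illustration, not the paper's exact design.

```python
# Sketch: extend a frozen FFN (viewed as key-value memory) with extra trainable slots.
import torch
import torch.nn as nn

class FFNWithKnowledgeBank(nn.Module):
    def __init__(self, ffn, d_model, extra_slots):
        super().__init__()
        self.ffn = ffn                                   # pretrained FFN, kept frozen
        for p in self.ffn.parameters():
            p.requires_grad = False
        # extra key/value slots: only these are trained during knowledge injection
        self.extra_keys = nn.Parameter(torch.randn(extra_slots, d_model) * 0.02)
        self.extra_values = nn.Parameter(torch.randn(extra_slots, d_model) * 0.02)

    def forward(self, x):                                # x: (..., d_model)
        original = self.ffn(x)                           # original key-value memory output
        scores = torch.relu(x @ self.extra_keys.t())     # activations over the new keys
        extra = scores @ self.extra_values               # weighted sum of the new values
        return original + extra
```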